Detecting hospital-acquired infections: A document classification approach using support vector machines and gradient tree boosting
Authors
Abstract
Hospital-acquired infections pose a significant risk to patient health, and their surveillance adds to the workload of hospital staff. Our overall aim is to build a surveillance system that reliably detects all patient records that potentially include hospital-acquired infections, so that hospital staff no longer have to check patient records manually. This study focuses on applying text classification with support vector machines and gradient tree boosting to the problem. Neither method has previously been applied to detecting hospital-acquired infections in Swedish patient records, and in our experiments both give encouraging results. Gradient tree boosting yields the best result, at 93.7 percent recall, 79.7 percent precision and 85.7 percent F1 score when using stemming. We show that simple preprocessing techniques and parameter tuning can lead to high recall (which we aim for in screening patient records) with appropriate precision for this task.
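The abstract reports recall, precision and F1 score. As a reminder of how these screening metrics are derived from classifier output, the following minimal sketch computes them from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the study; the study's own evaluation protocol (for example, how scores were averaged) is not described here.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, F1) from confusion-matrix counts.

    tp: records correctly flagged as containing an infection
    fp: records flagged although they contain no infection
    fn: records with an infection that the classifier missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the study).
p, r, f1 = precision_recall_f1(tp=90, fp=30, fn=10)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

In a screening setting such as the one described, a missed record (a false negative) is costlier than an extra manual check (a false positive), which is why recall is the metric to maximize.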
Similar resources
Detection of Hospital Acquired Infections in sparse and noisy Swedish patient records
Hospital Acquired Infections (HAI) pose a significant risk to patients’ health, while their surveillance is an additional workload for hospital medical staff and hospital management. Our overall aim is to build a system which reliably retrieves all patient records which potentially include HAI, to reduce the burden of manually checking patient records by the hospital staff. In other words, we e...
Classification of Arrhythmia Using Conjunction of Machine Learning Algorithms and ECG Diagnostic Criteria
This paper proposes a classification technique that combines machine learning algorithms with ECG diagnostic criteria to improve the accuracy of detecting arrhythmia from electrocardiogram (ECG) data. ECG is the most widely used first-line clinical instrument to record the electrical activities of the heart. The dataset from UC Irvine (UCI) Machine Learning Repository was used to im...
A Hybrid Random Forest based Support Vector Machine Classification supplemented by boosting
This paper presents an approach to classify remotely sensed data using a hybrid classifier. Random forest, support vector machines and boosting methods are used to build this hybrid classifier. The central idea is to subdivide the input data set into smaller subsets and classify individual subsets. The individual subset classification is done using a support vector machine classifier. Boosting...
Improving reservoir rock classification in heterogeneous carbonates using boosting and bagging strategies: A case study of early Triassic carbonates of coastal Fars, south Iran
An accurate reservoir characterization is a crucial task for the development of quantitative geological models and reservoir simulation. In the present research work, a novel view is presented on the reservoir characterization using the advantages of thin section image analysis and intelligent classification algorithms. The proposed methodology comprises three main steps. First, four classes of...
A data mining framework for detecting subscription fraud in telecommunication
Service providing companies including telecommunication companies often receive substantial damage from customers’ fraudulent behaviors. One of the common types of fraud is subscription fraud in which usage type is in contradiction with subscription type. This study aimed at identifying customers’ subscription fraud by employing data mining techniques and adopting knowledge discovery process. T...
Journal title:
Volume 24, Issue
Pages -
Publication date: 2018